AIbase

# Document Image Parsing

VL3 SigLIP NaViT
Apache-2.0
The visual encoder for VideoLLaMA3. It uses Any-resolution Vision Tokenization (AVT) to process images and videos of varying resolutions dynamically.
Text-to-Image Transformers English
DAMO-NLP-SG
25.55k
8
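
As a rough illustration of what processing inputs at varying resolutions means, the sketch below counts how many patch tokens a NaViT-style encoder would produce for images of different sizes. The patch size of 14, the padding rule, and the helper name `patch_grid` are assumptions made for illustration; they are not taken from the VL3 SigLIP NaViT model card, and the real encoder may also merge or compress tokens.

```python
import math


def patch_grid(height: int, width: int, patch_size: int = 14) -> tuple[int, int, int]:
    """Return (rows, cols, tokens) for an image split into square patches.

    Assumption: the image is padded up to a multiple of patch_size, so a
    partial patch at the border still counts as one patch.
    """
    if height <= 0 or width <= 0 or patch_size <= 0:
        raise ValueError("height, width and patch_size must be positive")
    rows = math.ceil(height / patch_size)
    cols = math.ceil(width / patch_size)
    return rows, cols, rows * cols


if __name__ == "__main__":
    # Hypothetical input sizes: the token count follows the resolution
    # instead of being fixed by a resize to one canonical shape.
    for h, w in [(224, 224), (448, 336), (1080, 1920)]:
        rows, cols, tokens = patch_grid(h, w)
        print(f"{h}x{w}: {rows} x {cols} grid, {tokens} tokens")
```

In this sketch the number of tokens grows with the input resolution, which is the property the description above refers to when it mentions dynamic processing.
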
Model3
MIT
A document image understanding model fine-tuned from naver-clova-ix/donut-base-finetuned-cord-v2.
Image-to-Text Transformers
sunilsai
13
0
Donut Base Medical Handwritten Blocks Data Extraction
MIT
A model based on the Donut architecture, designed for extracting structured data from handwritten medical documents.
Text Recognition Transformers
mjawadazad2321
15
1
© 2025 AIbase